Integrated segmentation and recognition of connected Ottoman script

Author

  • Zeki Yalniz
Abstract

Ismet Zeki Yalniz, Ismail Sengor Altingovde, Uğur Güdükbay, Özgür Ulusoy. Bilkent University, Department of Computer Engineering, Bilkent, Ankara, 06800 Turkey. E-mail: [email protected]

Abstract. We propose a novel context-sensitive segmentation and recognition method for connected letters in Ottoman script. This method first extracts a set of segments from a connected script and determines the candidate letters to which extracted segments are most similar. Next, a function is defined for scoring each different syntactically correct sequence of these candidate letters. To find the candidate letter sequence that maximizes the score function, a directed acyclic graph is constructed. The letters are finally recognized by computing the longest path in this graph. Experiments using a collection of printed Ottoman documents reveal that the proposed method provides 90% precision and recall figures in terms of character recognition. In a further set of experiments, we also demonstrate that the framework can be used as a building block for an information retrieval system for digital Ottoman archives.

© 2009 Society of Photo-Optical Instrumentation Engineers. DOI: 10.1117/1.3262346
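The longest-path step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the score function or the graph layout, so the sketch assumes that graph nodes are segment boundaries numbered left to right, that each candidate letter is an edge spanning one or more segments, and that the score of a sequence is the sum of per-letter similarity scores. The `Candidate` tuple layout and the name `best_letter_sequence` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical candidate format (an assumption, not from the paper):
# (start_boundary, end_boundary, letter, similarity_score)
Candidate = Tuple[int, int, str, float]


def best_letter_sequence(
    n_boundaries: int, candidates: List[Candidate]
) -> Tuple[float, List[str]]:
    """Longest path from boundary 0 to the last boundary of a DAG.

    Every edge runs from a lower to a higher boundary index, so handling
    edges in order of their start boundary is a valid topological order.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * n_boundaries          # best score reaching each boundary
    back: List[Optional[Candidate]] = [None] * n_boundaries
    best[0] = 0.0

    for cand in sorted(candidates, key=lambda c: c[0]):
        start, end, _letter, score = cand
        if not 0 <= start < end < n_boundaries:
            raise ValueError(f"invalid candidate span: {cand}")
        if best[start] == neg_inf:
            continue                          # start boundary is unreachable
        if best[start] + score > best[end]:
            best[end] = best[start] + score
            back[end] = cand

    if best[-1] == neg_inf:
        raise ValueError("no candidate sequence covers the whole word")

    # Walk the back-pointers from the last boundary to recover the letters.
    letters: List[str] = []
    pos = n_boundaries - 1
    while pos != 0:
        cand = back[pos]
        assert cand is not None
        letters.append(cand[2])
        pos = cand[0]
    letters.reverse()
    return best[-1], letters


if __name__ == "__main__":
    # Toy word with three segments (four boundaries); scores are made up.
    toy = [
        (0, 1, "a", 0.5),
        (1, 2, "b", 0.5),
        (0, 2, "c", 1.25),
        (2, 3, "d", 0.75),
        (1, 3, "e", 1.0),
    ]
    print(best_letter_sequence(4, toy))
```

In this toy input the letter "c" covers the first two segments and outscores "a" followed by "b", so the selected path is "c" then "d". A context-sensitive score, as the abstract describes, would additionally depend on neighbouring letters; the additive score used here is a simplification.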

Similar resources

Printed Text Recognition System for Multi-Script Image

An optical character recognition system transforms input text into an editable form. Multi-script recognition systems are required in countries like India, where people speak different languages across the country's many states. Multi-script recognition has recently become a demanding problem, and research work for expansion of optical character recognition scheme for cla...

Segmentation of Offline Handwritten Bengali Script

Character segmentation has long been one of the most critical areas of optical character recognition process. Through this operation, an image of a sequence of characters, which may be connected in some cases, is decomposed into sub-images of individual alphabetic symbols. In this paper, segmentation of cursive handwritten script of world’s fourth popular language, Bengali, is considered. Unlik...

A Survey on Script Segmentation for Bangla OCR

Script segmentation is an important primary task for any Optical Character Recognition (OCR) software, and it matters especially for off-line OCR of printed characters. Through script segmentation, a large image of a written document is fragmented into a number of small pieces, which are then used for pattern matching to determine the expected sequence of characters. In the impl...

Review: A Literature Survey on Text Segmentation in Handwritten Punjabi Documents

Gurumukhi script is used for the Punjabi language and is a two-dimensional composition of symbols with connected and disconnected diacritics. Handwritten Gurumukhi script has complexities such as connected and overlapping text lines, words and characters, which are among the foremost causes of errors during the recognition process. Text segmentation is a challenging job in unconstrained writer indep...

Publication date: 2009